Abstract

The detection of triadic subgraph motifs is a common methodology in complex-network
research. The procedure usually applied to detect motifs evaluates whether a certain
subgraph pattern is overrepresented in a network as a whole. However, motifs do not
necessarily appear frequently in every region of a graph. For this reason, we recently
introduced the framework of Node-Specific Pattern Mining (NoSPaM) [9, 11]. This work is
a manual for an implementation of NoSPaM which can be downloaded from www.mwinkler.eu.
Figure 1: All 13 possible non-isomorphic connected triadic subgraphs (subgraph patterns)
in directed unweighted networks.


1 Introduction 


Analyzing networks in terms of their local substructure is a well-established
methodology in complex-network science [1, 3, 4, 6-8]. In particular, triadic subgraph
structures have been studied intensively over the last 15 years [2, 4, 6, 10]. Apart
from node permutations, there are 13 connected triadic subgraph patterns in directed
unweighted networks (see Fig. 1). Those patterns that are significantly overrepresented
in a graph structure are referred to as motifs [6].

In order to evaluate the extent of an over- or underrepresentation, the commonly used
framework compares, for each pattern $i$, its frequency of occurrence in the original
network under investigation, $N_{\mathrm{original},i}$, to the expected frequency of
occurrence in an ensemble of random networks with the same degree distribution and the
same number of unidirectional and bidirectional links as the original network,
$\langle N_{\mathrm{rand},i} \rangle$. Over- and underrepresentation of pattern $i$ is
then quantified through a Z score

$$ Z_i = \frac{N_{\mathrm{original},i} - \langle N_{\mathrm{rand},i} \rangle}{\sigma_{\mathrm{rand},i}}, \tag{1} $$

where $\sigma_{\mathrm{rand},i}$ represents the standard deviation of
$N_{\mathrm{rand},i}$ in the ensemble of the null model. Hence, every network can be
assigned a vector Z whose components comprise the Z scores of all possible triad
patterns of Fig. 1.

However, this approach does not account for potentially existing heterogeneities in
graph structures. Suppose, e.g., the feed-forward loop (FFL) is overrepresented in a
certain area of a graph but, at the same time, highly underrepresented in another part
of the graph. On the global network level, the effects may cancel out such that the
Z score will be close to zero and possibly relevant structural information will be
lost. Therefore, we recently suggested the methodology of Node-Specific Pattern Mining
(NoSPaM) [9, 11]. Instead of mining frequent subgraphs on the system level, NoSPaM
investigates the neighborhood of every single node separately, i.e., for every node
$a$, NoSPaM considers only those triads in which $a$ participates. Since the position
of node $a$ in the triadic subgraphs matters now, the symmetry of most patterns shown
in Fig. 1 is broken and the number of connected node-specific triad patterns increases
from 13 to 30. These are shown in Fig. 2. To understand the increase in the number of
patterns, consider an ordinary subgraph of Fig. 1 whose three node positions are all
inequivalent. From the perspective of one particular node, it splits into three
node-specific triad patterns, e.g. patterns 14, 16, and 23 in Fig. 2. Furthermore,
some patterns are included in others, e.g. pattern 1 is a subset of pattern 3. In
order to avoid biased results, such a contained pattern is not double counted, i.e. an
observation of pattern 3 will only increase its corresponding count and not the one
associated with pattern 1 [9, 11].
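The step from 13 to 30 patterns can be checked by brute force. The following is a minimal Python sketch, not part of the NoSPaM distribution: it enumerates all link configurations on three nodes, keeps the connected ones, and counts equivalence classes under node relabeling, once with all three nodes interchangeable and once with one node held fixed. The function names and the encoding of dyad states are assumptions made for this illustration.

```python
from itertools import permutations, product


def edge_set(t_ab, t_ac, t_bc):
    """Turn three dyad states on nodes 0, 1, 2 into a set of directed edges.

    Dyad state for a pair (u, v): 0 = no link, 1 = u->v, 2 = v->u, 3 = both.
    """
    edges = set()
    for (u, v), t in (((0, 1), t_ab), ((0, 2), t_ac), ((1, 2), t_bc)):
        if t in (1, 3):
            edges.add((u, v))
        if t in (2, 3):
            edges.add((v, u))
    return frozenset(edges)


def relabel(edges, perm):
    """Apply a node permutation to every edge."""
    return frozenset((perm[u], perm[v]) for u, v in edges)


def count_patterns(fix_first_node):
    """Count connected triads up to relabeling; optionally keep node 0 fixed."""
    perms = [p for p in permutations(range(3)) if not fix_first_node or p[0] == 0]
    classes = set()
    for states in product(range(4), repeat=3):
        # Three nodes are connected only if at least two dyads carry a link.
        if sum(t != 0 for t in states) < 2:
            continue
        edges = edge_set(*states)
        # Canonical representative: the smallest relabeled edge list.
        classes.add(min(tuple(sorted(relabel(edges, p))) for p in perms))
    return len(classes)


print(count_patterns(fix_first_node=False))  # ordinary patterns: 13
print(count_patterns(fix_first_node=True))   # node-specific patterns: 30
```

Holding one node fixed leaves only the swap of the other two nodes as a symmetry, which is why fewer configurations collapse onto each other.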
Figure 2: All possible connected, non-isomorphic triadic subgraph patterns in terms of a
distinct node (here: lower node).


For every node $a$ in a graph, NoSPaM will compute Z scores for each of the 30
node-specific patterns $i$ shown in Fig. 2,

$$ Z_i^{a} = \frac{N^{a}_{\mathrm{original},i} - \langle N^{a}_{\mathrm{rand},i} \rangle}{\sigma^{a}_{\mathrm{rand},i}}. \tag{2} $$

$N^{a}_{\mathrm{original},i}$ is the number of appearances of pattern $i$ in the triads
node $a$ participates in. Accordingly, $\langle N^{a}_{\mathrm{rand},i} \rangle$ is the
expected frequency of pattern $i$ in the triads node $a$ is part of in the ensemble of
graphs with the same degree distribution and the same number of both unidirectional and
bidirectional links. $\sigma^{a}_{\mathrm{rand},i}$ is the corresponding standard
deviation.

The following section of this manual explains how to download and run an implementation
of NoSPaM. In Section 3, details of the applied algorithms are discussed.
2 How to use NoSPaM

2.1 Download and Installation 

• Requirements: make sure to have a recent Java version installed (version 1.7 or
higher).

• Download: NoSPaM can be downloaded from www.mwinkler.eu following the
Supplementary Material link.

• Extract the nospam.zip file to wherever you prefer to store NoSPaM.

• Go to the command line terminal and navigate to the nospam directory.

• Generate executable class files by typing: javac *.java

• In case of problems with the compilation process, make sure that the PATH variable
includes the JDK, e.g. 'C:\Program Files\Java\jdk1.7.0_11\bin'.

• Start a test run by typing: java nospam exampleNetwork.txt 2000 1500

2.2 Running NoSPaM 

Input data must be stored in the following format:

<source node 1><tab><target node 1>

<source node 2><tab><target node 2>

...

<source node M><tab><target node M>

Every line represents an edge of the network. The identities of the nodes must be
represented by integer values. Any additional entries in a line (e.g. an edge weight)
will be ignored by the algorithm. The file type is arbitrary and can be, e.g., .txt,
.dat, .csv, etc.
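To make the input format concrete, here is a minimal Python sketch of a reader for such a file. It is an illustration only and not the parser used by the Java implementation. The function name `read_edge_list` is hypothetical, and the handling of blank lines and repeated edges (skipped and merged, respectively) is an assumption.

```python
def read_edge_list(lines):
    """Parse tab-separated edge lines into a set of directed (source, target) pairs.

    Only the first two fields of each line are used, so extra columns such as
    edge weights are ignored, as described for the input format.
    """
    edges = set()
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # assumption: blank or incomplete lines are skipped
        source, target = int(fields[0]), int(fields[1])
        edges.add((source, target))
    return edges


# Usage with an in-memory example; a file object can be passed the same way,
# e.g. read_edge_list(open("exampleNetwork.txt")).
example = ["1\t2\n", "2\t1\t0.7\n", "\n", "2\t3\n"]
print(sorted(read_edge_list(example)))
```

The number of parsed edges, `len(edges)`, gives a starting point for choosing the number of switching attempts.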

The general command line syntax is as follows:
java nospam <filePath> <samples> <switching attempts>

• If the network file to be analyzed is already in the nospam directory (e.g. the file
exampleNetwork.txt), the variable <filePath> can simply be set to the value
<fileName>.<fileType>.

• The variable <samples> specifies the number of instances from the randomized
ensemble to be used for estimating the average values,
$\langle N^{a}_{\mathrm{rand},i} \rangle$, and the standard deviations,
$\sigma^{a}_{\mathrm{rand},i}$.

• The variable <switching attempts> specifies the number of microscopic rewiring
steps. It should be chosen proportionally to the number of edges, $|E|$, in the graph
and not smaller than $|E|$ (see Section 3 and References [5] and [9] for details).
2000 iterations, 1500 link-switch attempts per iteration.
Rows indicate in periodic order:
- number of occurrences in the real network
- average number of occurrences in the ensemble
- standard deviation in the ensemble
- nodes specific Z-scores in the ensemble

The order of the triad patterns can be found in the manual: NoSPaM Manual - A Tool for Node-Specific Triad Pattern Mining,
Figure 3: Example output file. The first column indicates the ids of the nodes. The
following 30 columns correspond to the node-specific patterns shown in Fig. 2 (in the
same order). For every node, there are four lines: the first line indicates the total
number of occurrences in the original network, $N^{a}_{\mathrm{original},i}$; the
second line indicates the mean number of occurrences in the randomized ensemble,
$\langle N^{a}_{\mathrm{rand},i} \rangle$; the third line indicates the standard
deviation in the randomized ensemble, $\sigma^{a}_{\mathrm{rand},i}$; the fourth line
indicates the node-specific Z scores, $Z_i^{a}$, which can be obtained from the other
three measures.


2.3 Output File 

The output of NoSPaM is written to a separate file of the same type as the input file.
Fig. 3 gives an example of such an output file. After the header lines, the evaluated
data is stored. For each node (indicated by the ids in the first column), the values of
$N^{a}_{\mathrm{original},i}$, $\langle N^{a}_{\mathrm{rand},i} \rangle$,
$\sigma^{a}_{\mathrm{rand},i}$, and $Z_i^{a}$ are stored in the columns corresponding
to the patterns $i$ in Fig. 2.

For the output file shown in Fig. 3, e.g., from the perspective of the node with id 1,
pattern 1 (see Fig. 2) has appeared 16 times in the network under investigation. In the
randomized ensemble it appeared 15.42 times on average with a standard deviation of
2.15, resulting in a Z score of $Z_1^{1} = 0.27$, indicating a non-significant
overrepresentation of the pattern. Further, pattern 2 has appeared 5 times in the
neighborhood of node 1, while in the randomized ensemble it occurred 10.39 times on
average with a standard deviation of 3.58, yielding a Z score of $Z_2^{1} = -1.51$,
indicating an underrepresentation of the pattern.
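Written out with Eq. (2), where the superscript is the node id and the subscript the pattern index, the two Z scores for node 1 follow directly from the first three rows of its output block:

```latex
\begin{align}
Z_1^{1} &= \frac{16 - 15.42}{2.15} = \frac{0.58}{2.15} \approx 0.27, \\
Z_2^{1} &= \frac{5 - 10.39}{3.58} = \frac{-5.39}{3.58} \approx -1.51.
\end{align}
```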
Figure 4: Microscopic link-switchings performed to generate the randomized ensembles.
(a) Pair switch and loop switch for unidirectional links. (b) Pair switch for
bidirectional links.


3 Algorithms 

3.1 Randomization 

The generation of the random null model with the same degree distribution and the
same number of unidirectional and bidirectional links adjacent to each vertex as in the
network under investigation is realized by a link-swapping algorithm. The microscopic
switching rules are illustrated in Fig. 4.

The number of microscopic switching attempts is the only parameter of the
randomization that needs to be specified in advance. The entire randomization procedure
is displayed in Algorithm 1. The Markov chain generated by successive link swappings
obeys detailed balance and converges towards a uniform distribution of the networks in
the ensemble serving as the null model, i.e. every valid network is sampled with equal
probability. This fact is elaborated in depth in [9]. The number of switching attempts
should be chosen proportionally to the number of edges in the graph [5].

3.2 Counting of Node-Specific Triad Patterns 

For counting the frequencies with which the distinct node-specific triad patterns occur,
an iteration over all connected triads is necessary. The procedure is illustrated in
Algorithm 2. Because it is computationally expensive to test all triads in the system
(the complexity is of order $O(N^3)$), we rather iterate over pairs of adjacent edges in
the graph. Since real-world networks are usually sparse, this is much more efficient [9].
Algorithm 1 Degree-preserving randomization of a graph

function Randomize(Graph $\mathcal{G}(V,E)$, no. of required steps)
    $s = 0$
    while $s$ < number of required rewiring steps do
        pick a random link $e_1 \in E$
        if $e_1$ is unidirectional then
            pick a 2nd unidirectional link $e_2 \in E$ at random
        else
            pick a 2nd bidirectional link $e_2 \in E$ at random
        end if
        if $e_1$ and $e_2$ do not share a node then
            rewire according to the pair-switch rules in Fig. 4
            if one of the new links already exists then
                undo the rewiring
            end if
        else if $e_1$ and $e_2$ participate in a loop then
            rewire according to the loop-switch rule in Fig. 4(a)
        end if
        $s{+}{+}$
    end while
    return randomized instance of $\mathcal{G}$
end function
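For readers who prefer executable code, the following Python sketch mirrors the structure of Algorithm 1. It is not the Java implementation shipped with NoSPaM, and it makes several assumptions that should be checked against Fig. 4 and Reference [9]: the pair switch turns a→b, c→d into a→d, c→b (and likewise for bidirectional links), the loop switch reverses a directed three-cycle of unidirectional links, a switch is rejected if it would place a link between two nodes that are already linked in either direction, and the input has no self-loops. The names `randomize`, `linked`, and `link_profile` are hypothetical.

```python
import random


def linked(E, u, v):
    """True if u and v are joined by a link in either direction."""
    return (u, v) in E or (v, u) in E


def randomize(edges, steps, seed=None):
    """Return a rewired copy of a non-empty directed edge set (sketch of Algorithm 1)."""
    rng = random.Random(seed)
    E = set(edges)
    for _ in range(steps):                      # every attempt counts as one step
        a, b = rng.choice(sorted(E))            # random link e1
        bidirectional = (b, a) in E
        # Candidates for e2: links of the same kind as e1 (simple, not fast).
        pool = sorted(e for e in E if ((e[1], e[0]) in E) == bidirectional)
        c, d = rng.choice(pool)
        if len({a, b, c, d}) == 4:              # e1 and e2 share no node
            if linked(E, a, d) or linked(E, c, b):
                continue                        # reject instead of rewire-and-undo
            if bidirectional:
                E -= {(a, b), (b, a), (c, d), (d, c)}
                E |= {(a, d), (d, a), (c, b), (b, c)}
            else:
                E -= {(a, b), (c, d)}
                E |= {(a, d), (c, b)}
        elif not bidirectional:
            # e1 and e2 are consecutive: look for the link closing a 3-cycle.
            if b == c and d != a:
                third = (d, a)
            elif d == a and c != b:
                third = (b, c)
            else:
                continue
            if third in E and (third[1], third[0]) not in E:
                loop = {(a, b), (c, d), third}
                E -= loop
                E |= {(v, u) for u, v in loop}  # reverse the whole cycle
    return E


def link_profile(E):
    """Per node: (unidirectional out-links, unidirectional in-links, bidirectional links)."""
    nodes = {n for e in E for n in e}
    return {n: (sum(1 for u, v in E if u == n and (v, u) not in E),
                sum(1 for u, v in E if v == n and (u, v) in E and (v, u) not in E),
                sum(1 for u, v in E if u == n and (v, u) in E))
            for n in nodes}
```

Comparing `link_profile` before and after a call to `randomize` is a quick way to confirm that the per-node numbers of unidirectional and bidirectional links are preserved. Rebuilding the candidate pool in every step keeps the sketch short but makes it unsuitable for large graphs.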


Algorithm 2 Counting of node-specific triad patterns

function NspPatternCounter(Graph $\mathcal{G}(V,E)$)
    $\mathcal{N}$: $N \times 30$-dimensional array storing the pattern counts for every node of $\mathcal{G}$
    for every edge $e \in E$ do
        $i, j \leftarrow$ IDs of $e$'s nodes with $i < j$
        $C \leftarrow \{\}$ be list of candidate nodes to form triad patterns comprising $e$
        $C \leftarrow$ all neighbors of $i$
        $C \leftarrow$ all neighbors of $j$
        for all $c \in C$ do
            if $i + j$ < sum of IDs of all other connected dyads in triad $(ijc)$ then
                increase the counts in $\mathcal{N}$ for $i$, $j$, and $c$ for
                their respective node-specific patterns
            end if
        end for
    end for
    return $\mathcal{N}$
end function
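The counting scheme of Algorithm 2 can be sketched in Python as follows. This is an illustration under assumptions, not the distributed implementation: each connected dyad is visited once, a triad is counted only from its connected dyad with the smallest ID sum, and patterns are labeled by a signature tuple rather than by the indices 1 to 30 of Fig. 2, because the mapping to those indices is not reproduced here. Integer node IDs and the absence of self-loops are assumed, and all function names are hypothetical.

```python
from collections import defaultdict

REVERSED = {0: 0, 1: 2, 2: 1, 3: 3}


def dyad_state(E, u, v):
    """0 = no link, 1 = u->v only, 2 = v->u only, 3 = both directions."""
    return (1 if (u, v) in E else 0) + (2 if (v, u) in E else 0)


def node_pattern(E, a, b, c):
    """Signature of triad {a, b, c} as seen from node a (b and c interchangeable)."""
    first = (dyad_state(E, a, b), dyad_state(E, a, c), dyad_state(E, b, c))
    second = (first[1], first[0], REVERSED[first[2]])
    return min(first, second)


def count_node_patterns(edges):
    """Return counts[node][signature] over all connected triads of the graph."""
    E = set(edges)
    neighbors = defaultdict(set)
    for u, v in E:
        neighbors[u].add(v)
        neighbors[v].add(u)
    counts = defaultdict(lambda: defaultdict(int))
    dyads = {(min(u, v), max(u, v)) for u, v in E}   # each connected dyad once
    for i, j in dyads:
        for c in (neighbors[i] | neighbors[j]) - {i, j}:
            # ID sums of the other connected dyads in the triad (i, j, c).
            others = [u + v for u, v in ((i, c), (j, c)) if v in neighbors[u]]
            if all(i + j < s for s in others):       # count each triad only once
                for a, b, d in ((i, j, c), (j, i, c), (c, i, j)):
                    counts[a][node_pattern(E, a, b, d)] += 1
    return counts


# Usage: in a feed-forward loop every node sees a different node-specific pattern.
ffl = count_node_patterns({(1, 2), (2, 3), (1, 3)})
print({node: dict(patterns) for node, patterns in ffl.items()})
```

Because the three node IDs of a triad are distinct integers, the ID sums of its dyads are distinct as well, so exactly one connected dyad passes the test and every connected triad is counted once.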
3.3 Node-Specific Triad Pattern Mining

The evaluation of the node-specific Z scores is performed by Algorithm 3. For a more
detailed discussion of the algorithms, including considerations of the computational
effort, see References [9, 11].


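Algorithm 3 Node-specific triad pattern mining (NoSPaM)

function NoSPaM(Graph $\mathcal{G}$, # required rewiring steps, # randomized instances)
    $\mathcal{N}_{\mathrm{original}} \leftarrow$ NspPatternCounter($\mathcal{G}$)
    $\mathcal{N}_{\mathrm{rand}} \leftarrow \{\}$
    $\mathcal{N}_{\mathrm{sq,rand}} \leftarrow \{\}$
    for # randomized instances do
        $\mathcal{G} \leftarrow$ Randomize($\mathcal{G}$, # required rewiring steps)
        counts $\leftarrow$ NspPatternCounter($\mathcal{G}$)
        $\mathcal{N}_{\mathrm{rand}} \leftarrow \mathcal{N}_{\mathrm{rand}} +$ counts
        $\mathcal{N}_{\mathrm{sq,rand}} \leftarrow \mathcal{N}_{\mathrm{sq,rand}} +$ counts * counts
    end for
    $\mathcal{N}_{\mathrm{rand}} \leftarrow \mathcal{N}_{\mathrm{rand}}$ / (# randomized instances)
    $\mathcal{N}_{\mathrm{sq,rand}} \leftarrow \mathcal{N}_{\mathrm{sq,rand}}$ / (# randomized instances)
    $\sigma_{\mathrm{rand}} \leftarrow \sqrt{\mathcal{N}_{\mathrm{sq,rand}} - (\mathcal{N}_{\mathrm{rand}} * \mathcal{N}_{\mathrm{rand}})}$
    $Z \leftarrow (\mathcal{N}_{\mathrm{original}} - \mathcal{N}_{\mathrm{rand}}) / \sigma_{\mathrm{rand}}$
    return $Z$
end function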
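The accumulation step of Algorithm 3 can be illustrated with a short Python sketch. It keeps running sums of the counts and of their squares and derives mean, standard deviation, and Z score from them, as in the pseudocode. It works on flat lists of counts (one entry per node-pattern combination) and takes the ensemble counts as an argument, so the randomization and counting steps are not repeated here. The function name `z_scores` is hypothetical, and returning NaN when the standard deviation vanishes is an assumption; how the distributed implementation treats that case is not stated in this manual.

```python
import math


def z_scores(original, samples):
    """Z scores from running sums, following the structure of Algorithm 3.

    original: list of counts in the original network.
    samples:  iterable of equally long count lists, one per randomized instance.
    """
    n = 0
    total = [0.0] * len(original)
    total_sq = [0.0] * len(original)
    for counts in samples:
        n += 1
        for k, x in enumerate(counts):
            total[k] += x
            total_sq[k] += x * x
    z = []
    for k, observed in enumerate(original):
        mean = total[k] / n
        # Population variance; clamp tiny negative values caused by rounding.
        variance = max(total_sq[k] / n - mean * mean, 0.0)
        sigma = math.sqrt(variance)
        z.append((observed - mean) / sigma if sigma > 0 else float("nan"))
    return z


# Usage with three made-up ensemble samples for two node-pattern combinations.
print(z_scores([16, 5], [[14, 7], [16, 9], [18, 11]]))
```

Accumulating sums and squared sums avoids storing every randomized instance, at the price of some numerical cancellation when the variance is small compared to the mean.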